Abstract

In (Hazan and Kale, 2008), the authors showed that the regret of the Follow the Regularized
Leader (FTRL) algorithm for online linear optimization can be bounded by the total variation
of the cost vectors. In this paper, we extend this result to general online convex optimization.
We first analyze the limitations of the FTRL algorithm in (Hazan and Kale, 2008) when applied
to online convex optimization, and extend the definition of variation to a sequential variation,
which is shown to be a lower bound of the total variation up to a constant factor. We then
present two novel algorithms that bound the regret by the sequential variation of cost functions.
Unlike previous approaches that maintain a single sequence of solutions, the proposed algorithms
maintain two sequences of solutions, which makes it possible to achieve a variation-based regret
bound for online convex optimization.

Keywords: online convex optimization, regret bound, variation, bandit

1. Introduction 

We consider the general online convex optimization problem (Zinkevich, 2003), which proceeds
in trials. At each trial, the learner is asked to predict a decision vector $\mathbf{x}_t$ that
belongs to a bounded closed convex set $\mathcal{P} \subseteq \mathbb{R}^d$; it then receives a
cost function $c_t(\cdot) : \mathcal{P} \to \mathbb{R}$ and incurs a cost of $c_t(\mathbf{x}_t)$.
The goal of online convex optimization is to come up with a sequence of solutions
$\mathbf{x}_1, \ldots, \mathbf{x}_T$ that minimizes the regret, which is defined as the difference
between the cost accumulated by the learner's decisions up to trial $T$ and the cost of the best
fixed decision in hindsight, i.e.,

$$\text{regret} = \sum_{t=1}^{T} c_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}).$$


In the special case where the cost functions are linear, $c_t(\mathbf{x}) = \mathbf{f}_t^\top \mathbf{x}$,
the problem becomes online linear optimization. The goal of online convex optimization is to
design algorithms that predict, with a small regret, the solution $\mathbf{x}_t$ at the $t$th trial
given the (partial) knowledge about the past cost functions $c_\tau(\cdot), \tau = 1, \ldots, t-1$.
Many algorithms have been proposed for online convex optimization, especially for online linear
optimization. Zinkevich (2003) proposed a gradient descent algorithm for online convex
optimization with a regret bound of $O(\sqrt{T})$. When cost functions are strongly convex, the
regret bound of the online gradient descent algorithm is reduced to $O(\log(T))$ with an
appropriately chosen step size (Hazan et al., 2007), and to $O(1)$ by a more recent work
(Hazan and Kale, 2011). Another common methodology for online convex optimization, especially
for online linear optimization, is based on the framework of Follow the Leader (FTL)
(Kalai and Vempala, 2005). FTL chooses $\mathbf{x}_t$ by minimizing the cost incurred by
$\mathbf{x}_t$ in all previous trials. Since the naive FTL algorithm fails to achieve a sublinear
regret in the worst case, many variants have been developed to fix the problem, including Follow
the Perturbed Leader (FTPL) (Kalai and Vempala, 2005), Follow the Regularized Leader (FTRL)
(Abernethy et al., 2008), and Follow the Approximate Leader (FTAL) (Hazan et al., 2007). Other
methodologies for online convex optimization introduce a potential function (or link function)
to map solutions between the space of primal variables and the space of dual variables, and
carry out primal-dual updates based on the potential function. The well-known Exponentiated
Gradient (EG) algorithm (Kivinen and Warmuth, 1995) and the multiplicative weights algorithm
(Freund and Schapire, 1995) belong to this category. We note that these different algorithms are
closely related. For example, in online linear optimization, the potential-based primal-dual
algorithm is equivalent to the FTRL algorithm (Hazan and Kale, 2008). All of these studies bound
the regret by the number of trials $T$.

An open problem posed in (Bianchi et al., 2005) was whether it is possible to bound the regret
of an online algorithm by the variation of the observed costs. It has been established that the
regret of a natural algorithm in a stochastic setting can be bounded by the total variation in
the cost vectors (Hazan and Kale, 2010). It is therefore of great interest to derive a
variation-based regret bound for online convex optimization in an adversarial setting (as
opposed to a stochastic setting). Recently, Hazan and Kale (2008, 2010) made substantial
progress in this direction. They proved a variation-based regret bound for online linear
optimization by the FTRL algorithm with an appropriately chosen step size. A similar regret
bound is shown in the same paper for prediction from expert advice by modifying the
multiplicative weights algorithm. In this work, we aim to take one step further. Our goal is to
develop algorithms for online convex optimization with variation-based regret bounds. In the
remainder of this section, we first present the results from (Hazan and Kale, 2008, 2010) for
online linear optimization and discuss their potential limitations when applied to online convex
optimization.

1.1. Online Linear Optimization 

Many decision problems can be cast as online linear optimization problems, such as prediction
from expert advice (Cesa-Bianchi and Lugosi, 2006) and the online shortest path problem
(Takimoto and Warmuth, 2003). Hazan and Kale (2008, 2010) proved the first variation-based
regret bound for online linear optimization problems in an adversarial setting. Hazan and Kale's
algorithm for online linear optimization is based on the framework of FTRL. For completeness,
the algorithm is shown in Algorithm 1. At each trial, the decision vector $\mathbf{x}_t$ is given
by solving the following optimization problem:

$$\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \sum_{\tau=1}^{t-1} \mathbf{f}_\tau^\top \mathbf{x} + \frac{1}{2\eta}\|\mathbf{x}\|_2^2,$$

where $\mathbf{f}_t$ is the cost vector received at trial $t$ after predicting the decision
$\mathbf{x}_t$, and $\eta$ is a step size. They bound the regret by the variation of cost vectors
defined as

$$\mathrm{VAR}_T = \sum_{t=1}^{T} \|\mathbf{f}_t - \boldsymbol{\mu}\|_2^2, \tag{1}$$

where $\boldsymbol{\mu} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{f}_t$. By assuming $\|\mathbf{f}_t\|_2 \le 1, \forall t$,
and setting $\eta = \min(2/\sqrt{\mathrm{VAR}_T}, 1/6)$, they showed that the regret of Algorithm 1
can be bounded by

$$\sum_{t=1}^{T} \mathbf{f}_t^\top \mathbf{x}_t - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} \mathbf{f}_t^\top \mathbf{x} \le
\begin{cases} 15\sqrt{\mathrm{VAR}_T} & \text{if } \sqrt{\mathrm{VAR}_T} \ge 12, \\ 150 & \text{if } \sqrt{\mathrm{VAR}_T} \le 12. \end{cases} \tag{2}$$

From (2), we can see that when the variation of the cost vectors is small ($\sqrt{\mathrm{VAR}_T}$
at most 12), the regret is a constant; otherwise it is bounded by $O(\sqrt{\mathrm{VAR}_T})$.

Algorithm 1 Follow The Regularized Leader (FTRL) for Online Linear Optimization
1: Input: $\eta > 0$
2: for $t = 1, \ldots, T$ do
3:   If $t = 1$, predict $\mathbf{x}_t = \mathbf{0}$
4:   If $t > 1$, predict $\mathbf{x}_t$ by $\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \sum_{\tau=1}^{t-1} \mathbf{f}_\tau^\top \mathbf{x} + \frac{1}{2\eta}\|\mathbf{x}\|_2^2$
5:   Receive a cost vector $\mathbf{f}_t$ and incur a loss $\mathbf{f}_t^\top \mathbf{x}_t$
6: end for
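
As a rough illustration of Algorithm 1 and of the total variation in (1), the following sketch runs FTRL on a small problem. It assumes that the decision set is the unit Euclidean ball, so that the argmin in step 4 reduces to a projection; the function names are hypothetical and are not taken from (Hazan and Kale, 2008).

```python
import math


def project_unit_ball(v):
    """Euclidean projection onto the unit ball {x : ||x||_2 <= 1}."""
    norm = math.sqrt(sum(a * a for a in v))
    return list(v) if norm <= 1.0 else [a / norm for a in v]


def total_variation(costs):
    """VAR_T = sum_t ||f_t - mu||_2^2, with mu the mean cost vector, as in (1)."""
    T, d = len(costs), len(costs[0])
    mu = [sum(f[i] for f in costs) / T for i in range(d)]
    return sum(sum((f[i] - mu[i]) ** 2 for i in range(d)) for f in costs)


def ftrl_linear(costs, eta):
    """Run Algorithm 1 on the unit ball and return the regret."""
    d = len(costs[0])
    grad_sum = [0.0] * d  # sum of the cost vectors seen so far
    loss = 0.0
    for f in costs:
        # On the ball, argmin_x grad_sum^T x + ||x||^2 / (2 eta) is the
        # projection of -eta * grad_sum (this also gives x_1 = 0).
        x = project_unit_ball([-eta * s for s in grad_sum])
        loss += sum(fi * xi for fi, xi in zip(f, x))
        grad_sum = [s + fi for s, fi in zip(grad_sum, f)]
    # The best fixed point in the ball for a linear cost is -grad_sum / ||grad_sum||.
    best = -math.sqrt(sum(s * s for s in grad_sum))
    return loss - best


if __name__ == "__main__":
    costs = [[0.5, 0.0] for _ in range(50)]
    var = total_variation(costs)
    eta = 1.0 / 6.0 if var == 0 else min(2.0 / math.sqrt(var), 1.0 / 6.0)
    print(var, ftrl_linear(costs, eta))
```

With constant cost vectors the variation is zero, and the regret of this sketch stays bounded no matter how many trials are played.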

1.2. Online Convex Optimization 

Online convex optimization generalizes online linear optimization by replacing linear cost
functions with non-linear convex cost functions. It has found applications in several domains,
including portfolio management (Agarwal et al., 2006) and online classification
(Kivinen et al., 2004). For example, in the online portfolio management problem, an investor
wants to distribute his wealth over a set of stocks without knowing the market output in
advance. If we let $\mathbf{x}_t$ denote the distribution over the stocks and $\mathbf{r}_t$ denote the
price relative vector, i.e., $\mathbf{r}_t[i]$ denotes the ratio of the closing price of stock $i$ on
day $t$ to the closing price on day $t-1$, then an interesting function is the logarithmic growth
ratio, i.e., $\sum_{t=1}^{T} \log(\mathbf{x}_t^\top \mathbf{r}_t)$, which is a concave function that needs to
be maximized. Similar to (Hazan and Kale, 2008, 2010), we aim to develop algorithms for online
convex optimization with regrets bounded by the variation in the cost functions. Before
presenting our algorithms, we first show that directly applying the FTRL algorithm to general
online convex optimization may not achieve the desired result.

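To extend FTRL to online convex optimization, a straightforward approach is to use the first
order approximation of the convex cost function, i.e.,
$c_t(\mathbf{x}) \approx c_t(\mathbf{x}_t) + \nabla c_t(\mathbf{x}_t)^\top (\mathbf{x} - \mathbf{x}_t)$, and to replace
the cost vector $\mathbf{f}_t$ in Algorithm 1 with the gradient of the cost function $c_t(\cdot)$ at
$\mathbf{x}_t$, i.e., $\mathbf{f}_t = \nabla c_t(\mathbf{x}_t)$. Using the convexity of $c_t(\cdot)$, we have

$$\sum_{t=1}^{T} c_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le \sum_{t=1}^{T} \mathbf{f}_t^\top \mathbf{x}_t - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} \mathbf{f}_t^\top \mathbf{x}. \tag{3}$$

If we assume $\|\nabla c_t(\mathbf{x})\|_2 \le 1, \forall t, \forall \mathbf{x} \in \mathcal{P}$, we can apply Hazan
and Kale's variation-based bound in (2) to bound the regret in (3) by the variation

$$\mathrm{VAR}_T = \sum_{t=1}^{T} \Big\| \nabla c_t(\mathbf{x}_t) - \frac{1}{T}\sum_{\tau=1}^{T} \nabla c_\tau(\mathbf{x}_\tau) \Big\|_2^2. \tag{4}$$

To better understand $\mathrm{VAR}_T$ in (4), we rewrite $\mathrm{VAR}_T$ as

$$\begin{aligned}
\mathrm{VAR}_T &= \sum_{t=1}^{T} \Big\| \nabla c_t(\mathbf{x}_t) - \frac{1}{T}\sum_{\tau=1}^{T} \nabla c_\tau(\mathbf{x}_\tau) \Big\|_2^2
= \frac{1}{2T}\sum_{t,\tau=1}^{T} \|\nabla c_t(\mathbf{x}_t) - \nabla c_\tau(\mathbf{x}_\tau)\|_2^2 \\
&\le \frac{1}{T}\sum_{t=1}^{T}\sum_{\tau=1}^{T} \|\nabla c_t(\mathbf{x}_t) - \nabla c_t(\mathbf{x}_\tau)\|_2^2
+ \frac{1}{T}\sum_{t=1}^{T}\sum_{\tau=1}^{T} \|\nabla c_t(\mathbf{x}_\tau) - \nabla c_\tau(\mathbf{x}_\tau)\|_2^2 \\
&= \mathrm{VAR}_T^1 + \mathrm{VAR}_T^2.
\end{aligned} \tag{5}$$

We see that the variation $\mathrm{VAR}_T$ is bounded by two parts: $\mathrm{VAR}_T^1$ essentially
measures the smoothness of individual cost functions, while $\mathrm{VAR}_T^2$ measures the
variation in the gradients of cost functions. As a result, even when all the cost functions are
identical, $\mathrm{VAR}_T^2$ vanishes while $\mathrm{VAR}_T^1$ still exists, and therefore the regret
of the FTRL algorithm for online convex optimization may still be bounded by $O(\sqrt{T})$
regardless of the smoothness of the cost function.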
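
To make the last point concrete, the following worked case is a sketch that uses only the definitions in (5). It takes all cost functions equal to a single function $c$, an assumption made purely for illustration.

```latex
% Illustrative special case: c_t = c for all t.
\mathrm{VAR}_T^2 = \frac{1}{T}\sum_{t=1}^{T}\sum_{\tau=1}^{T}
  \|\nabla c(\mathbf{x}_\tau) - \nabla c(\mathbf{x}_\tau)\|_2^2 = 0,
\qquad
\mathrm{VAR}_T^1 = \frac{1}{T}\sum_{t=1}^{T}\sum_{\tau=1}^{T}
  \|\nabla c(\mathbf{x}_t) - \nabla c(\mathbf{x}_\tau)\|_2^2 .
```

The second quantity vanishes only if the gradient takes the same value at every iterate, which need not happen while the iterates are still moving.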

To address this challenge, we develop two novel algorithms for online convex optimization that
bound the regret by the variation of cost functions. In particular, we would like to bound the
regret of online convex optimization by the variation of cost functions defined as follows:

$$\mathrm{VAR}_T^s = \sum_{t=1}^{T-1} \max_{\mathbf{x} \in \mathcal{P}} \|\nabla c_{t+1}(\mathbf{x}) - \nabla c_t(\mathbf{x})\|_2^2. \tag{6}$$




Note that the variation in (6) is defined in terms of the sequential difference between each
cost function and its predecessor, while the variation in (1) (Hazan and Kale, 2008) is defined
in terms of the total difference between individual cost vectors and their mean. Therefore we
refer to the variation defined in (6) as sequential variation, and to the variation defined in
(1) as total variation. It is straightforward to show that when $c_t(\mathbf{x}) = \mathbf{f}_t^\top \mathbf{x}$,
the sequential variation $\mathrm{VAR}_T^s$ defined in (6) is upper bounded by the total variation
$\mathrm{VAR}_T$ defined in (1) up to a constant factor:

$$\sum_{t=1}^{T-1} \|\mathbf{f}_{t+1} - \mathbf{f}_t\|_2^2 \le \sum_{t=1}^{T-1} \left( 2\|\mathbf{f}_{t+1} - \boldsymbol{\mu}\|_2^2 + 2\|\mathbf{f}_t - \boldsymbol{\mu}\|_2^2 \right) \le 4\sum_{t=1}^{T} \|\mathbf{f}_t - \boldsymbol{\mu}\|_2^2.$$


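On the other hand, we cannot bound the total variation by the sequential variation up to a
constant. This is verified by the following example:
$\mathbf{f}_1 = \cdots = \mathbf{f}_{T/2} = \mathbf{f}$ and $\mathbf{f}_{T/2+1} = \cdots = \mathbf{f}_T = \mathbf{g} \ne \mathbf{f}$.
The total variation in (1) is given by

$$\mathrm{VAR}_T = \sum_{t=1}^{T} \|\mathbf{f}_t - \boldsymbol{\mu}\|_2^2 = \frac{T}{2}\Big\|\mathbf{f} - \frac{\mathbf{f} + \mathbf{g}}{2}\Big\|_2^2 + \frac{T}{2}\Big\|\mathbf{g} - \frac{\mathbf{f} + \mathbf{g}}{2}\Big\|_2^2 = \Omega(T),$$

while the sequential variation defined in (6) is a constant given by

$$\mathrm{VAR}_T^s = \sum_{t=1}^{T-1} \|\mathbf{f}_{t+1} - \mathbf{f}_t\|_2^2 = \|\mathbf{f} - \mathbf{g}\|_2^2 = O(1).$$

Based on the above analysis, we claim that the regret bound by sequential variation is usually
tighter than the bound by total variation.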
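
The two-block example above can be checked numerically. The sketch below is only an illustration: the helper names and the specific vectors are hypothetical choices, and for linear costs the gradient does not depend on the point, so the maximum in (6) is trivial.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_variation(costs):
    """Total variation (1): squared distances of the cost vectors to their mean."""
    T, d = len(costs), len(costs[0])
    mu = [sum(f[i] for f in costs) / T for i in range(d)]
    return sum(sq_dist(f, mu) for f in costs)


def sequential_variation(costs):
    """Sequential variation (6) for linear costs: squared consecutive differences."""
    return sum(sq_dist(costs[t + 1], costs[t]) for t in range(len(costs) - 1))


def two_block_costs(f, g, T):
    """First half of the trials uses f, second half uses g."""
    half = T // 2
    return [list(f) for _ in range(half)] + [list(g) for _ in range(T - half)]


if __name__ == "__main__":
    for T in (10, 100, 1000):
        costs = two_block_costs([1.0, 0.0], [0.0, 1.0], T)
        print(T, total_variation(costs), sequential_variation(costs))
```

The total variation grows linearly with the number of trials, while the sequential variation stays at the squared distance between the two cost vectors.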

The remainder of the paper is organized as follows. We present in section 2 the proposed 
algorithms and the main results. In section 3, we conclude this work and discuss how to 
extend the proposed algorithms to online bandit convex optimization with a variation-based 
regret bound. 

2. Algorithms and Main Results 

Without loss of generality, we assume the decision set $\mathcal{P}$ is contained in a unit ball
$\mathcal{B}$, i.e., $\mathcal{P} \subseteq \mathcal{B}$, and $\mathbf{0} \in \mathcal{P}$ (Hazan and Kale, 2008). We
propose two algorithms for online convex optimization. The first algorithm is an improved FTRL
and the second one is based on the mirror prox method (Nemirovski, 2005). One common feature
shared by the two algorithms is that both of them maintain two sequences of solutions: decision
vectors $\mathbf{x}_{1:T} = (\mathbf{x}_1, \ldots, \mathbf{x}_T)$ and searching vectors
$\mathbf{z}_{1:T} = (\mathbf{z}_1, \ldots, \mathbf{z}_T)$ that facilitate the updates of decision vectors. Both
algorithms share almost the same regret bound except for a constant factor. To facilitate the
discussion, besides the variation of cost functions defined in (6), we define another variation,
named extended sequential variation, as follows:

$$\mathrm{EVAR}_T^s(\mathbf{z}_{1:T}) = \sum_{t=0}^{T-1} \|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_2^2 \le \|\nabla c_1(\mathbf{z}_0)\|_2^2 + \mathrm{VAR}_T^s, \tag{7}$$

where $c_0(\mathbf{x}) = 0$ and $\mathbf{z}_0$ is specified in the algorithms (usually zero). When all cost
functions are identical, $\mathrm{VAR}_T^s$ becomes zero and the extended variation
$\mathrm{EVAR}_T^s(\mathbf{z}_{1:T})$ is reduced to $\|\nabla c_1(\mathbf{z}_0)\|_2^2$, a constant independent
of the number of trials. In the sequel, we use the notation $\mathrm{EVAR}_T^s$ for simplicity. In
this study, we assume smooth cost functions with Lipschitz continuous gradients, i.e., there
exists a constant $L > 0$ such that

$$\|\nabla c_t(\mathbf{x}) - \nabla c_t(\mathbf{z})\|_2 \le L\|\mathbf{x} - \mathbf{z}\|_2, \quad \forall \mathbf{x}, \mathbf{z} \in \mathcal{P}, \forall t. \tag{8}$$

Our results show that for online convex optimization with $L$-smooth cost functions, the regrets
of the proposed algorithms can be bounded as follows:

$$\sum_{t=1}^{T} c_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le O\left(\sqrt{\mathrm{EVAR}_T^s}\right) + \text{constant}. \tag{9}$$


Remark: We would like to emphasize that our assumption about the smoothness of cost functions
is necessary to achieve the variation-based bound stated in (9). To see this, consider the
special case of $c_1(\mathbf{x}) = \cdots = c_T(\mathbf{x}) = c(\mathbf{x})$. If the bound in (9) holds for any
sequence of convex functions, then for the special case where all cost functions are identical,
we will have

$$\sum_{t=1}^{T} c(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c(\mathbf{x}) \le \text{constant},$$

implying that $\bar{\mathbf{x}}_T = (1/T)\sum_{t=1}^{T} \mathbf{x}_t$ approaches the optimal solution at the
rate of $O(1/T)$. This contradicts the lower complexity bound (i.e., $O(1/\sqrt{T})$) for any
first order optimization method (Nesterov, 2004, Theorem 3.2.1).

Algorithm 2 Improved FTRL for Online Convex Optimization
1: Input: $\eta \in (0, 1]$
2: Initialization: $\mathbf{z}_0 = \mathbf{0}$ and $c_0(\mathbf{x}) = 0$
3: for $t = 1, \ldots, T$ do
4:   Predict $\mathbf{x}_t$ by $\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ \mathbf{x}^\top \nabla c_{t-1}(\mathbf{z}_{t-1}) + \frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_{t-1}\|_2^2 \right\}$
5:   Receive a cost function $c_t(\cdot)$ and incur a loss $c_t(\mathbf{x}_t)$
6:   Update $\mathbf{z}_t$ by $\mathbf{z}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ \mathbf{x}^\top \sum_{\tau=1}^{t} \nabla c_\tau(\mathbf{z}_{\tau-1}) + \frac{L}{2\eta}\|\mathbf{x}\|_2^2 \right\}$
7: end for

2.1. An Improved FTRL Algorithm for Online Convex Optimization 

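The improved FTRL algorithm for online convex optimization is presented in Algorithm 2. Note
that in step 6, the searching vectors $\mathbf{z}_t$ are updated according to the FTRL algorithm after
receiving the cost function $c_t(\cdot)$. To understand the updating procedure for the decision
vector $\mathbf{x}_t$ specified in step 4, we rewrite it as

$$\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ c_{t-1}(\mathbf{z}_{t-1}) + (\mathbf{x} - \mathbf{z}_{t-1})^\top \nabla c_{t-1}(\mathbf{z}_{t-1}) + \frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_{t-1}\|_2^2 \right\}. \tag{10}$$

Notice that

$$\begin{aligned}
c_t(\mathbf{x}) &\le c_t(\mathbf{z}_{t-1}) + (\mathbf{x} - \mathbf{z}_{t-1})^\top \nabla c_t(\mathbf{z}_{t-1}) + \frac{L}{2}\|\mathbf{x} - \mathbf{z}_{t-1}\|_2^2 \\
&\le c_t(\mathbf{z}_{t-1}) + (\mathbf{x} - \mathbf{z}_{t-1})^\top \nabla c_t(\mathbf{z}_{t-1}) + \frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_{t-1}\|_2^2,
\end{aligned} \tag{11}$$

where the first inequality follows from the smoothness condition in (8) and the second
inequality follows from the fact $\eta \le 1$. The inequality (11) provides an upper bound for
$c_t(\mathbf{x})$ and therefore can be used as an approximation of $c_t(\mathbf{x})$ for predicting
$\mathbf{x}_t$. However, since $\nabla c_t(\mathbf{z}_{t-1})$ is unknown before the prediction, we use
$\nabla c_{t-1}(\mathbf{z}_{t-1})$ as a surrogate for $\nabla c_t(\mathbf{z}_{t-1})$, leading to the updating rule
in (10). It is this approximation that leads to the variation bound. The following theorem
states the regret bound of Algorithm 2.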

Theorem 1 Let $c_t(\cdot), t = 1, \ldots, T$ be a sequence of convex functions with $L$-Lipschitz
continuous gradients. By setting $\eta = \min\left\{1, L/\sqrt{\mathrm{EVAR}_T^s}\right\}$, we have the
following regret bound for Algorithm 2:

$$\sum_{t=1}^{T} c_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le \max\left(L, \sqrt{\mathrm{EVAR}_T^s}\right).$$

Remark: Comparing with the variation bound in (5) for the FTRL algorithm, the term $L$ plays the
same role as $\mathrm{VAR}_T^1$, which accounts for the smoothness of cost functions, and the term
$\mathrm{EVAR}_T^s$ plays the same role as $\mathrm{VAR}_T^2$, which accounts for the variation in the
cost functions. Compared to the FTRL algorithm, the key advantage of the improved FTRL algorithm
is that the regret bound is reduced to a constant when the cost functions change only a constant
number of times along the horizon. Of course, the extended variation $\mathrm{EVAR}_T^s$ may not be
known a priori for setting the optimal $\eta$; in that case we can apply the standard halving
trick (Cesa-Bianchi and Lugosi, 2006) to obtain the same order of regret bound. To prove
Theorem 1, we first present the following lemma.

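Lemma 2 Let $c_t(\cdot), t = 1, \ldots, T$ be a sequence of convex functions with $L$-Lipschitz
continuous gradients. By running Algorithm 2 over $T$ trials, we have

$$\sum_{t=1}^{T} c_t(\mathbf{x}_t) \le \min_{\mathbf{x} \in \mathcal{P}} \left\{ \frac{L}{2\eta}\|\mathbf{x}\|_2^2 + \sum_{t=1}^{T} \left[ c_t(\mathbf{z}_{t-1}) + (\mathbf{x} - \mathbf{z}_{t-1})^\top \nabla c_t(\mathbf{z}_{t-1}) \right] \right\} + \frac{\eta}{2L}\sum_{t=0}^{T-1} \|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_2^2.$$

Proof We prove the inequality by induction. When $T = 1$, we have $\mathbf{x}_1 = \mathbf{z}_0 = \mathbf{0}$ and

$$\begin{aligned}
&\min_{\mathbf{x} \in \mathcal{P}} \left\{ \frac{L}{2\eta}\|\mathbf{x}\|_2^2 + c_1(\mathbf{z}_0) + (\mathbf{x} - \mathbf{z}_0)^\top \nabla c_1(\mathbf{z}_0) \right\} + \frac{\eta}{2L}\|\nabla c_1(\mathbf{z}_0)\|_2^2 \\
&\quad \ge c_1(\mathbf{z}_0) + \frac{\eta}{2L}\|\nabla c_1(\mathbf{z}_0)\|_2^2 + \min_{\mathbf{x}} \left\{ \frac{L}{2\eta}\|\mathbf{x}\|_2^2 + (\mathbf{x} - \mathbf{z}_0)^\top \nabla c_1(\mathbf{z}_0) \right\} = c_1(\mathbf{z}_0) = c_1(\mathbf{x}_1).
\end{aligned}$$

We assume the inequality holds for $t$ and aim to prove it for $t + 1$. To this end, we define

$$\psi_t(\mathbf{x}) = \frac{L}{2\eta}\|\mathbf{x}\|_2^2 + \sum_{\tau=1}^{t} \left[ c_\tau(\mathbf{z}_{\tau-1}) + (\mathbf{x} - \mathbf{z}_{\tau-1})^\top \nabla c_\tau(\mathbf{z}_{\tau-1}) \right] + \frac{\eta}{2L}\sum_{\tau=0}^{t-1} \|\nabla c_{\tau+1}(\mathbf{z}_\tau) - \nabla c_\tau(\mathbf{z}_\tau)\|_2^2.$$

According to the updating procedure for $\mathbf{z}_t$ in step 6, we have
$\mathbf{z}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \psi_t(\mathbf{x})$. Define
$\phi_t = \psi_t(\mathbf{z}_t) = \min_{\mathbf{x} \in \mathcal{P}} \psi_t(\mathbf{x})$. Since $\psi_t(\mathbf{x})$ is an
$(L/\eta)$-strongly convex function, we have

$$\begin{aligned}
\psi_{t+1}(\mathbf{x}) - \psi_{t+1}(\mathbf{z}_t) &\ge \frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_t\|_2^2 + (\mathbf{x} - \mathbf{z}_t)^\top \nabla \psi_{t+1}(\mathbf{z}_t) \\
&= \frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_t\|_2^2 + (\mathbf{x} - \mathbf{z}_t)^\top \left( \nabla \psi_t(\mathbf{z}_t) + \nabla c_{t+1}(\mathbf{z}_t) \right).
\end{aligned}$$

Setting $\mathbf{x} = \mathbf{z}_{t+1} = \arg\min_{\mathbf{x} \in \mathcal{P}} \psi_{t+1}(\mathbf{x})$ in the above inequality
results in

$$\begin{aligned}
\psi_{t+1}(\mathbf{z}_{t+1}) - \psi_{t+1}(\mathbf{z}_t) &= \phi_{t+1} - \left( \phi_t + c_{t+1}(\mathbf{z}_t) + \frac{\eta}{2L}\|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_2^2 \right) \\
&\ge \frac{L}{2\eta}\|\mathbf{z}_{t+1} - \mathbf{z}_t\|_2^2 + (\mathbf{z}_{t+1} - \mathbf{z}_t)^\top \left( \nabla \psi_t(\mathbf{z}_t) + \nabla c_{t+1}(\mathbf{z}_t) \right) \\
&\ge \frac{L}{2\eta}\|\mathbf{z}_{t+1} - \mathbf{z}_t\|_2^2 + (\mathbf{z}_{t+1} - \mathbf{z}_t)^\top \nabla c_{t+1}(\mathbf{z}_t),
\end{aligned}$$

where the second inequality follows from the fact
$\mathbf{z}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \psi_t(\mathbf{x})$, and therefore
$(\mathbf{x} - \mathbf{z}_t)^\top \nabla \psi_t(\mathbf{z}_t) \ge 0, \forall \mathbf{x} \in \mathcal{P}$. Then we have

$$\begin{aligned}
&\phi_{t+1} - \phi_t - \frac{\eta}{2L}\|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_2^2 \\
&\quad \ge \min_{\mathbf{x} \in \mathcal{P}} \left\{ \frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_t\|_2^2 + (\mathbf{x} - \mathbf{z}_t)^\top \nabla c_{t+1}(\mathbf{z}_t) + c_{t+1}(\mathbf{z}_t) \right\} \\
&\quad = \min_{\mathbf{x} \in \mathcal{P}} \Big\{ \underbrace{\frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_t\|_2^2 + (\mathbf{x} - \mathbf{z}_t)^\top \nabla c_t(\mathbf{z}_t)}_{\rho(\mathbf{x})} + c_{t+1}(\mathbf{z}_t) + \underbrace{(\mathbf{x} - \mathbf{z}_t)^\top \left( \nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t) \right)}_{r(\mathbf{x})} \Big\}.
\end{aligned} \tag{12}$$
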
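To bound the right hand side, we note that $\mathbf{x}_{t+1}$ is the minimizer of $\rho(\mathbf{x})$ by
step 4 in Algorithm 2, and $\rho(\mathbf{x})$ is an $(L/\eta)$-strongly convex function, so we have

$$\rho(\mathbf{x}) \ge \rho(\mathbf{x}_{t+1}) + \underbrace{(\mathbf{x} - \mathbf{x}_{t+1})^\top \nabla \rho(\mathbf{x}_{t+1})}_{\ge 0} + \frac{L}{2\eta}\|\mathbf{x} - \mathbf{x}_{t+1}\|_2^2 \ge \rho(\mathbf{x}_{t+1}) + \frac{L}{2\eta}\|\mathbf{x} - \mathbf{x}_{t+1}\|_2^2.$$

Then we have

$$\rho(\mathbf{x}) + c_{t+1}(\mathbf{z}_t) + r(\mathbf{x}) \ge \rho(\mathbf{x}_{t+1}) + c_{t+1}(\mathbf{z}_t) + \frac{L}{2\eta}\|\mathbf{x} - \mathbf{x}_{t+1}\|_2^2 + r(\mathbf{x}).$$

We proceed by bounding (12) as

$$\begin{aligned}
&\phi_{t+1} - \phi_t - \frac{\eta}{2L}\|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_2^2 \\
&\quad \ge \frac{L}{2\eta}\|\mathbf{x}_{t+1} - \mathbf{z}_t\|_2^2 + (\mathbf{x}_{t+1} - \mathbf{z}_t)^\top \nabla c_t(\mathbf{z}_t) + c_{t+1}(\mathbf{z}_t) \\
&\qquad + \min_{\mathbf{x} \in \mathcal{P}} \left\{ \frac{L}{2\eta}\|\mathbf{x} - \mathbf{x}_{t+1}\|_2^2 + (\mathbf{x} - \mathbf{z}_t)^\top \left( \nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t) \right) \right\} \\
&\quad = \frac{L}{2\eta}\|\mathbf{x}_{t+1} - \mathbf{z}_t\|_2^2 + (\mathbf{x}_{t+1} - \mathbf{z}_t)^\top \nabla c_{t+1}(\mathbf{z}_t) + c_{t+1}(\mathbf{z}_t) \\
&\qquad + \min_{\mathbf{x} \in \mathcal{P}} \left\{ \frac{L}{2\eta}\|\mathbf{x} - \mathbf{x}_{t+1}\|_2^2 + (\mathbf{x} - \mathbf{x}_{t+1})^\top \left( \nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t) \right) \right\} \\
&\quad \ge \frac{L}{2\eta}\|\mathbf{x}_{t+1} - \mathbf{z}_t\|_2^2 + (\mathbf{x}_{t+1} - \mathbf{z}_t)^\top \nabla c_{t+1}(\mathbf{z}_t) + c_{t+1}(\mathbf{z}_t) \\
&\qquad + \min_{\mathbf{x}} \left\{ \frac{L}{2\eta}\|\mathbf{x} - \mathbf{x}_{t+1}\|_2^2 + (\mathbf{x} - \mathbf{x}_{t+1})^\top \left( \nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t) \right) \right\} \\
&\quad = \frac{L}{2\eta}\|\mathbf{x}_{t+1} - \mathbf{z}_t\|_2^2 + (\mathbf{x}_{t+1} - \mathbf{z}_t)^\top \nabla c_{t+1}(\mathbf{z}_t) + c_{t+1}(\mathbf{z}_t) - \frac{\eta}{2L}\|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_2^2 \\
&\quad \ge c_{t+1}(\mathbf{x}_{t+1}) - \frac{\eta}{2L}\|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_2^2,
\end{aligned}$$

where the first equality follows by writing
$(\mathbf{x}_{t+1} - \mathbf{z}_t)^\top \nabla c_t(\mathbf{z}_t) = (\mathbf{x}_{t+1} - \mathbf{z}_t)^\top \nabla c_{t+1}(\mathbf{z}_t) - (\mathbf{x}_{t+1} - \mathbf{z}_t)^\top (\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t))$,
the second inequality relaxes the minimization to the whole space, and the last inequality
follows from the smoothness condition of $c_{t+1}(\mathbf{x})$ together with $\eta \le 1$. Since
$\phi_t \ge \sum_{\tau=1}^{t} c_\tau(\mathbf{x}_\tau)$, we have
$\phi_{t+1} \ge \sum_{\tau=1}^{t+1} c_\tau(\mathbf{x}_\tau)$. ■

Algorithm 3 Prox Method for Online Convex Optimization
1: Input: $\eta > 0$
2: Initialization: $\mathbf{z}_0 = \mathbf{0}$ and $c_0(\mathbf{x}) = 0$
3: for $t = 1, \ldots, T$ do
4:   Predict $\mathbf{x}_t$ by $\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ \mathbf{x}^\top \nabla c_{t-1}(\mathbf{z}_{t-1}) + \frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_{t-1}\|_2^2 \right\}$
5:   Receive a cost function $c_t(\cdot)$ and incur a loss $c_t(\mathbf{x}_t)$
6:   Update $\mathbf{z}_t$ by $\mathbf{z}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ \mathbf{x}^\top \nabla c_t(\mathbf{x}_t) + \frac{L}{2\eta}\|\mathbf{x} - \mathbf{z}_{t-1}\|_2^2 \right\}$
7: end for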



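Proof [of Theorem 1] By $\|\mathbf{x}\|_2 \le 1, \forall \mathbf{x} \in \mathcal{P} \subseteq \mathcal{B}$, and the
convexity of $c_t(\mathbf{x})$, we have

$$\min_{\mathbf{x} \in \mathcal{P}} \left\{ \frac{L}{2\eta}\|\mathbf{x}\|_2^2 + \sum_{t=1}^{T} \left[ c_t(\mathbf{z}_{t-1}) + (\mathbf{x} - \mathbf{z}_{t-1})^\top \nabla c_t(\mathbf{z}_{t-1}) \right] \right\} \le \frac{L}{2\eta} + \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}).$$

Combining the above result with Lemma 2, we have

$$\sum_{t=1}^{T} c_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le \frac{L}{2\eta} + \frac{\eta}{2L}\mathrm{EVAR}_T^s.$$

By choosing $\eta = \min(1, L/\sqrt{\mathrm{EVAR}_T^s})$, we have the regret bound in Theorem 1. ■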
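
The updates of Algorithm 2 and the intermediate bound $L/(2\eta) + (\eta/(2L))\,\mathrm{EVAR}_T^s$ from the proof can be explored with a small simulation. The sketch below rests on assumptions that are not part of the paper: the decision set is the unit ball (so both argmin steps become projections), and the costs are the quadratics $c_t(\mathbf{x}) = (L/2)\|\mathbf{x} - \mathbf{a}_t\|_2^2$ with centers $\mathbf{a}_t$ inside the ball. All names are hypothetical.

```python
import math


def project_unit_ball(v):
    """Euclidean projection onto the unit ball {x : ||x||_2 <= 1}."""
    norm = math.sqrt(sum(a * a for a in v))
    return list(v) if norm <= 1.0 else [a / norm for a in v]


def improved_ftrl(centers, L, eta):
    """Run Algorithm 2 on c_t(x) = (L/2)||x - a_t||^2; return (regret, evar)."""
    d = len(centers[0])

    def grad(a, x):
        return [L * (xi - ai) for xi, ai in zip(x, a)]

    def cost(a, x):
        return 0.5 * L * sum((xi - ai) ** 2 for xi, ai in zip(x, a))

    z = [0.0] * d          # z_0 = 0
    prev_grad = [0.0] * d  # gradient of c_{t-1} at z_{t-1}; zero because c_0 = 0
    grad_sum = [0.0] * d   # running sum of the gradients of c_tau at z_{tau-1}
    loss, evar = 0.0, 0.0
    for a in centers:
        # Step 4: on the ball the argmin is a projected gradient step from z_{t-1}.
        x = project_unit_ball([zi - (eta / L) * gi for zi, gi in zip(z, prev_grad)])
        loss += cost(a, x)  # Step 5
        g = grad(a, z)      # gradient of c_t at z_{t-1}
        evar += sum((gi - pi) ** 2 for gi, pi in zip(g, prev_grad))
        grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
        # Step 6: FTRL update over all linearized costs.
        z = project_unit_ball([-(eta / L) * s for s in grad_sum])
        prev_grad = grad(a, z)  # gradient of c_t at z_t, needed at trial t + 1
    T = len(centers)
    mean = [sum(a[i] for a in centers) / T for i in range(d)]
    best = sum(cost(a, mean) for a in centers)  # the mean minimizes these quadratics
    return loss - best, evar


if __name__ == "__main__":
    centers = [[0.5, 0.0]] * 20 + [[0.0, 0.5]] * 20
    regret, evar = improved_ftrl(centers, L=1.0, eta=1.0)
    print(regret, evar, 1.0 / 2.0 + evar / 2.0)
```

In this toy run the cost function changes only once, so the extended sequential variation stays constant as more trials are added.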



2.2. A Prox Method for Online Convex Optimization 

In this subsection, we present a prox method for online convex optimization that shares the
same order of regret bound as the improved FTRL algorithm. It is closely related to the prox
method in (Nemirovski, 2005) in that it maintains two sets of vectors $\mathbf{x}_{1:T}$ and
$\mathbf{z}_{1:T}$, where $\mathbf{x}_t$ and $\mathbf{z}_t$ are computed by gradient mappings using
$\nabla c_{t-1}(\mathbf{z}_{t-1})$ and $\nabla c_t(\mathbf{x}_t)$, respectively, as presented in Algorithm 3.
Algorithm 3 differs from Algorithm 2 only in updating the searching points $\mathbf{z}_t$.
Algorithm 2 updates $\mathbf{z}_t$ by the FTRL scheme using all the gradients of the cost functions
at $\{\mathbf{z}_\tau\}_{\tau=0}^{t-1}$, while Algorithm 3 updates $\mathbf{z}_t$ by a prox method using a
single gradient $\nabla c_t(\mathbf{x}_t)$. It is this difference that makes it easier to extend the
prox method to a bandit setting, which will be discussed in section 3. The following theorem
states the regret bound of the prox method for online convex optimization.

Theorem 3 Let $c_t(\cdot), t = 1, \ldots, T$ be a sequence of convex functions with $L$-Lipschitz
continuous gradients. By setting $\eta = (1/2)\min\left\{1, L/\sqrt{\mathrm{EVAR}_T^s}\right\}$, we have
the following regret bound for Algorithm 3:

$$\sum_{t=1}^{T} c_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le 2\max\left(L, \sqrt{\mathrm{EVAR}_T^s}\right).$$

Compared to Theorem 1, the regret bound in Theorem 3 is slightly worse by a factor of 2. To
prove Theorem 3, we need the following lemma, which is Lemma 3.1 in (Nemirovski, 2005) stated
in our notation.

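Lemma 4 (Lemma 3.1 (Nemirovski, 2005)) Let $\omega(\mathbf{z})$ be an $\alpha$-strongly convex
function with respect to the norm $\|\cdot\|$, whose dual norm is denoted by $\|\cdot\|_*$, and
$D(\mathbf{x}, \mathbf{z}) = \omega(\mathbf{x}) - (\omega(\mathbf{z}) + (\mathbf{x} - \mathbf{z})^\top \omega'(\mathbf{z}))$ be the Bregman
distance induced by the function $\omega(\mathbf{x})$. Let $Z$ be a convex compact set, and
$U \subseteq Z$ be convex and closed. Let $\mathbf{z} \in Z$, $\gamma > 0$. Consider the points

$$\mathbf{x} = \arg\min_{\mathbf{u} \in U} \gamma \mathbf{u}^\top \boldsymbol{\xi} + D(\mathbf{u}, \mathbf{z}), \tag{13}$$

$$\mathbf{z}_+ = \arg\min_{\mathbf{u} \in U} \gamma \mathbf{u}^\top \boldsymbol{\zeta} + D(\mathbf{u}, \mathbf{z}), \tag{14}$$

then for any $\mathbf{u} \in U$, we have

$$\gamma \boldsymbol{\zeta}^\top (\mathbf{x} - \mathbf{u}) \le D(\mathbf{u}, \mathbf{z}) - D(\mathbf{u}, \mathbf{z}_+) + \frac{\gamma^2}{\alpha}\|\boldsymbol{\xi} - \boldsymbol{\zeta}\|_*^2 - \frac{\alpha}{2}\left[ \|\mathbf{x} - \mathbf{z}\|^2 + \|\mathbf{x} - \mathbf{z}_+\|^2 \right]. \tag{15}$$

To spare readers the complex notation used in (Nemirovski, 2005) for the proof of Lemma 4, we
present a detailed proof in Appendix A, which is an adaptation of the original proof to our
notation.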

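Proof [of Theorem 3] First, we note that the two updates in step 4 and step 6 of Algorithm 3
fit Lemma 4 if we let $U = Z = \mathcal{P}$, $\mathbf{z} = \mathbf{z}_{t-1}$, $\mathbf{x} = \mathbf{x}_t$,
$\mathbf{z}_+ = \mathbf{z}_t$, and $\omega(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|_2^2$, which is a 1-strongly convex
function with respect to $\|\cdot\|_2$. Then $D(\mathbf{u}, \mathbf{z}) = \frac{1}{2}\|\mathbf{u} - \mathbf{z}\|_2^2$. As a
result, the two updates for $\mathbf{x}_t, \mathbf{z}_t$ in Algorithm 3 are exactly the updates in (13) and
(14) with $\mathbf{z} = \mathbf{z}_{t-1}$, $\gamma = \eta/L$, $\boldsymbol{\xi} = \nabla c_{t-1}(\mathbf{z}_{t-1})$, and
$\boldsymbol{\zeta} = \nabla c_t(\mathbf{x}_t)$. Substituting these into (15), we have the following
inequality for any $\mathbf{z} \in \mathcal{P}$:

$$\frac{\eta}{L}(\mathbf{x}_t - \mathbf{z})^\top \nabla c_t(\mathbf{x}_t) \le \frac{1}{2}\left( \|\mathbf{z} - \mathbf{z}_{t-1}\|_2^2 - \|\mathbf{z} - \mathbf{z}_t\|_2^2 \right) + \frac{\eta^2}{L^2}\|\nabla c_t(\mathbf{x}_t) - \nabla c_{t-1}(\mathbf{z}_{t-1})\|_2^2 - \frac{1}{2}\|\mathbf{x}_t - \mathbf{z}_{t-1}\|_2^2.$$

Then we have

$$\begin{aligned}
\frac{\eta}{L}\left( c_t(\mathbf{x}_t) - c_t(\mathbf{z}) \right) &\le \frac{\eta}{L}(\mathbf{x}_t - \mathbf{z})^\top \nabla c_t(\mathbf{x}_t) \\
&\le \frac{1}{2}\left( \|\mathbf{z} - \mathbf{z}_{t-1}\|_2^2 - \|\mathbf{z} - \mathbf{z}_t\|_2^2 \right) + \frac{2\eta^2}{L^2}\|\nabla c_t(\mathbf{z}_{t-1}) - \nabla c_{t-1}(\mathbf{z}_{t-1})\|_2^2 \\
&\qquad + \frac{2\eta^2}{L^2}\|\nabla c_t(\mathbf{x}_t) - \nabla c_t(\mathbf{z}_{t-1})\|_2^2 - \frac{1}{2}\|\mathbf{x}_t - \mathbf{z}_{t-1}\|_2^2 \\
&\le \frac{1}{2}\left( \|\mathbf{z} - \mathbf{z}_{t-1}\|_2^2 - \|\mathbf{z} - \mathbf{z}_t\|_2^2 \right) + \frac{2\eta^2}{L^2}\|\nabla c_t(\mathbf{z}_{t-1}) - \nabla c_{t-1}(\mathbf{z}_{t-1})\|_2^2 \\
&\qquad + \underbrace{\left( 2\eta^2 - \frac{1}{2} \right)\|\mathbf{x}_t - \mathbf{z}_{t-1}\|_2^2}_{\le 0 \text{ due to } \eta \le 1/2},
\end{aligned}$$

where the first inequality follows from the convexity of $c_t(\mathbf{x})$, the second combines the
previous display with $\|\mathbf{a} + \mathbf{b}\|_2^2 \le 2\|\mathbf{a}\|_2^2 + 2\|\mathbf{b}\|_2^2$, and the third
inequality follows from the smoothness of $c_t(\mathbf{x})$. By taking the summation over
$t = 1, \ldots, T$ with $\mathbf{z}^* = \arg\min_{\mathbf{z} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{z})$, and dividing
both sides by $\eta/L$, we have

$$\sum_{t=1}^{T} c_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le \frac{L}{2\eta} + \frac{2\eta}{L}\sum_{t=0}^{T-1} \|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_2^2.$$

We complete the proof by plugging in the value of $\eta$. ■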
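
For comparison with the improved FTRL algorithm, the following sketch simulates Algorithm 3 and evaluates the intermediate bound $L/(2\eta) + (2\eta/L)\,\mathrm{EVAR}_T^s$ from the proof above. As assumptions made only for illustration, the decision set is the unit ball and the costs are the quadratics $c_t(\mathbf{x}) = (L/2)\|\mathbf{x} - \mathbf{a}_t\|_2^2$ with centers inside the ball; all names are hypothetical.

```python
import math


def project_unit_ball(v):
    """Euclidean projection onto the unit ball {x : ||x||_2 <= 1}."""
    norm = math.sqrt(sum(a * a for a in v))
    return list(v) if norm <= 1.0 else [a / norm for a in v]


def prox_method(centers, L, eta):
    """Run Algorithm 3 on c_t(x) = (L/2)||x - a_t||^2; return (regret, evar)."""
    d = len(centers[0])

    def grad(a, x):
        return [L * (xi - ai) for xi, ai in zip(x, a)]

    def cost(a, x):
        return 0.5 * L * sum((xi - ai) ** 2 for xi, ai in zip(x, a))

    z = [0.0] * d          # z_0 = 0
    prev_grad = [0.0] * d  # gradient of c_{t-1} at z_{t-1}; zero because c_0 = 0
    loss, evar = 0.0, 0.0
    for a in centers:
        # Step 4: projected step from z_{t-1} along the previous gradient.
        x = project_unit_ball([zi - (eta / L) * gi for zi, gi in zip(z, prev_grad)])
        loss += cost(a, x)  # Step 5
        g_at_z = grad(a, z)  # gradient of c_t at z_{t-1}, used only for EVAR
        evar += sum((gi - pi) ** 2 for gi, pi in zip(g_at_z, prev_grad))
        # Step 6: projected step from z_{t-1} along the gradient of c_t at x_t.
        g_at_x = grad(a, x)
        z = project_unit_ball([zi - (eta / L) * gi for zi, gi in zip(z, g_at_x)])
        prev_grad = grad(a, z)  # gradient of c_t at z_t, needed at trial t + 1
    T = len(centers)
    mean = [sum(a[i] for a in centers) / T for i in range(d)]
    best = sum(cost(a, mean) for a in centers)  # the mean minimizes these quadratics
    return loss - best, evar


if __name__ == "__main__":
    regret, evar = prox_method([[0.5, 0.0]] * 30, L=1.0, eta=0.5)
    print(regret, evar, 1.0 / (2 * 0.5) + 2 * 0.5 * evar)
```

Unlike the FTRL update, step 6 here needs only the most recent gradient, which is the property the text highlights for the bandit extension.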

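Remark: Note that the prox method, together with Lemma 4, provides an easy way to generalize the
framework based on the Euclidean norm to a general norm. To be precise, let $\|\cdot\|$ denote a
general norm, $\|\cdot\|_*$ denote its dual norm, $\omega(\mathbf{z})$ be an $\alpha$-strongly convex
function with respect to the norm $\|\cdot\|$, and
$D(\mathbf{x}, \mathbf{z}) = \omega(\mathbf{x}) - (\omega(\mathbf{z}) + (\mathbf{x} - \mathbf{z})^\top \omega'(\mathbf{z}))$ be the Bregman
distance induced by the function $\omega(\mathbf{x})$. Let $c_t(\cdot), t = 1, \ldots, T$ be $L$-smooth
functions with respect to the norm $\|\cdot\|$, i.e.,
$\|\nabla c_t(\mathbf{x}) - \nabla c_t(\mathbf{z})\|_* \le L\|\mathbf{x} - \mathbf{z}\|$. Correspondingly, we define the
extended sequential variation based on the general norm as follows:

$$\mathrm{EVAR}_T^{\|\cdot\|} = \sum_{t=0}^{T-1} \|\nabla c_{t+1}(\mathbf{z}_t) - \nabla c_t(\mathbf{z}_t)\|_*^2. \tag{16}$$

Algorithm 4 gives the detailed steps for the general framework. We note that the key
differences from Algorithm 3 are: $\mathbf{z}_0$ is set to $\arg\min_{\mathbf{z} \in \mathcal{P}} \omega(\mathbf{z})$, and
the Euclidean distances in steps 4 and 6 are replaced by Bregman distances, i.e.,

$$\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ \mathbf{x}^\top \nabla c_{t-1}(\mathbf{z}_{t-1}) + \frac{L}{\eta} D(\mathbf{x}, \mathbf{z}_{t-1}) \right\},$$

$$\mathbf{z}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ \mathbf{x}^\top \nabla c_t(\mathbf{x}_t) + \frac{L}{\eta} D(\mathbf{x}, \mathbf{z}_{t-1}) \right\}.$$

The following theorem states the variation-based regret bound for the general norm framework,
where $R$ measures the size of $\mathcal{P}$ and is defined as
$R = \sqrt{2(\max_{\mathbf{x} \in \mathcal{P}} \omega(\mathbf{x}) - \min_{\mathbf{x} \in \mathcal{P}} \omega(\mathbf{x}))}$.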


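Theorem 5 Let $c_t(\cdot), t = 1, \ldots, T$ be a sequence of convex functions whose gradients are
$L$-Lipschitz continuous, $\omega(\mathbf{z})$ be an $\alpha$-strongly convex function, both with
respect to the norm $\|\cdot\|$, and $\mathrm{EVAR}_T^{\|\cdot\|}$ be defined in (16). By setting
$\eta = (1/2)\min\left\{ \sqrt{\alpha}, LR/\sqrt{\mathrm{EVAR}_T^{\|\cdot\|}} \right\}$, we have the
following regret bound:

$$\sum_{t=1}^{T} c_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le 2R\max\left( LR/\sqrt{\alpha}, \sqrt{\mathrm{EVAR}_T^{\|\cdot\|}} \right).$$
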

We skip the proof since it is similar to that of Theorem 3. 
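
As one possible instance of the general framework, the sketch below assumes that the decision set is the probability simplex and that $\omega$ is the negative entropy, whose Bregman distance is the Kullback-Leibler divergence. Neither choice is made in the text; they are assumptions for illustration. Under them, each argmin in steps 4 and 6 has the well-known closed form of a multiplicative update, and $\mathbf{z}_0$ is the uniform distribution.

```python
import math


def entropic_prox_step(z, g, step):
    """Return argmin over the simplex of x^T g + (1/step) * KL(x, z).

    The closed form is x_i proportional to z_i * exp(-step * g_i); z must be
    strictly positive and sum to one.
    """
    w = [zi * math.exp(-step * gi) for zi, gi in zip(z, g)]
    total = sum(w)
    return [wi / total for wi in w]


def general_prox_trial(z_prev, grad_prev_at_z, grad_t, L, eta):
    """One trial of the two-step scheme with step = eta / L.

    grad_prev_at_z is the gradient of c_{t-1} at z_{t-1};
    grad_t is a function returning the gradient of c_t at a given point.
    """
    step = eta / L
    x_t = entropic_prox_step(z_prev, grad_prev_at_z, step)  # step 4
    z_t = entropic_prox_step(z_prev, grad_t(x_t), step)     # step 6
    return x_t, z_t


if __name__ == "__main__":
    z0 = [0.25] * 4  # uniform distribution minimizes the negative entropy
    x1, z1 = general_prox_trial(z0, [0.0] * 4, lambda x: [1.0, 0.0, 0.0, 0.0], 1.0, 0.5)
    print(x1, z1)
```

With $c_0 = 0$ the first prediction is the uniform distribution, and the searching point then moves mass away from the coordinate with the largest gradient.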

3. Conclusions and Open Problems

In this paper, we proposed two algorithms for online convex optimization that bound the 
regret by the variation of cost functions. The first algorithm is an improvement of FTRL 
algorithm, and the second algorithm is based on the prox method. 

One open problem is how to extend the proposed algorithms to the case where the learner only
receives partial feedback about the cost functions. One common scenario of partial feedback is
that the learner only receives the cost $c_t(\mathbf{x}_t)$ at the predicted point $\mathbf{x}_t$ without
observing the entire cost function $c_t(\mathbf{x})$. This setup is usually referred to as the bandit
setting, and the related online learning problem is called online bandit convex optimization.
Many algorithms have been proposed for online bandit convex optimization with regret bounds
stated in the number of trials (Flaxman et al., 2005; Awerbuch and Kleinberg, 2004;
Dani and Hayes, 2006; Abernethy et al., 2008). In (Hazan and Kale, 2009), the authors extended
the FTRL algorithm to online bandit linear optimization and obtained a variation-based regret
bound of $O(\mathrm{poly}(d)\sqrt{\mathrm{VAR}_T \log(T)} + \mathrm{poly}(d\log(T)))$, where
$\mathrm{VAR}_T$ is the total variation of the cost vectors. The open question is how to develop
algorithms for general online bandit convex optimization with a variation-based regret bound.
Directly extending the proposed algorithms to the bandit setting may be difficult because they
need to keep track of and update two sets of solutions $\mathbf{x}_{1:T}$ and $\mathbf{z}_{1:T}$, and
therefore it is insufficient to query each cost function only once. One possibility is to
explore the multi-point bandit setting proposed in (Agarwal et al., 2010), where multiple points
can be queried for each cost function. In Appendix B, we extend the prox method to the
multi-point bandit setting using $O(d)$ queries, and prove a variation-based regret bound which
is optimal when the variation of cost functions is independent of $T$. It remains an open
problem how to achieve a variation-based regret bound with a constant number of queries
independent of the dimension $d$. Another open problem for future work is how to reduce the
dependence on $T$ in the regret bound for online bandit convex optimization.

Algorithm 4 General Prox Method for Online Convex Optimization
1: Input: $\eta > 0$, $\omega(\mathbf{z})$
2: Initialization: $\mathbf{z}_0 = \arg\min_{\mathbf{z} \in \mathcal{P}} \omega(\mathbf{z})$ and $c_0(\mathbf{x}) = 0$
3: for $t = 1, \ldots, T$ do
4:   Predict $\mathbf{x}_t$ by $\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ \mathbf{x}^\top \nabla c_{t-1}(\mathbf{z}_{t-1}) + \frac{L}{\eta} D(\mathbf{x}, \mathbf{z}_{t-1}) \right\}$
5:   Receive a cost function $c_t(\cdot)$ and incur a loss $c_t(\mathbf{x}_t)$
6:   Update $\mathbf{z}_t$ by $\mathbf{z}_t = \arg\min_{\mathbf{x} \in \mathcal{P}} \left\{ \mathbf{x}^\top \nabla c_t(\mathbf{x}_t) + \frac{L}{\eta} D(\mathbf{x}, \mathbf{z}_{t-1}) \right\}$
7: end for

Appendix A: Proof of Lemma 4

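By using the definition of the Bregman distance $D(\mathbf{u}, \mathbf{z})$, we can write equations (13) and
(14) as

$$\mathbf{x} = \arg\min_{\mathbf{u} \in U} \mathbf{u}^\top (\gamma \boldsymbol{\xi} - \omega'(\mathbf{z})) + \omega(\mathbf{u}),$$

$$\mathbf{z}_+ = \arg\min_{\mathbf{u} \in U} \mathbf{u}^\top (\gamma \boldsymbol{\zeta} - \omega'(\mathbf{z})) + \omega(\mathbf{u}).$$

By the first order optimality condition, we have

$$(\mathbf{u} - \mathbf{x})^\top (\gamma \boldsymbol{\xi} - \omega'(\mathbf{z}) + \omega'(\mathbf{x})) \ge 0, \quad \forall \mathbf{u} \in U, \tag{17}$$

$$(\mathbf{u} - \mathbf{z}_+)^\top (\gamma \boldsymbol{\zeta} - \omega'(\mathbf{z}) + \omega'(\mathbf{z}_+)) \ge 0, \quad \forall \mathbf{u} \in U. \tag{18}$$

Applying (17) with $\mathbf{u} = \mathbf{z}_+$ and (18) with $\mathbf{u} = \mathbf{x}$, we get

$$\gamma (\mathbf{x} - \mathbf{z}_+)^\top \boldsymbol{\xi} \le (\omega'(\mathbf{z}) - \omega'(\mathbf{x}))^\top (\mathbf{x} - \mathbf{z}_+),$$

$$\gamma (\mathbf{z}_+ - \mathbf{x})^\top \boldsymbol{\zeta} \le (\omega'(\mathbf{z}) - \omega'(\mathbf{z}_+))^\top (\mathbf{z}_+ - \mathbf{x}).$$

Summing up the two inequalities, we have

$$\gamma (\mathbf{x} - \mathbf{z}_+)^\top (\boldsymbol{\xi} - \boldsymbol{\zeta}) \le (\omega'(\mathbf{z}_+) - \omega'(\mathbf{x}))^\top (\mathbf{x} - \mathbf{z}_+).$$

Then

$$\gamma \|\boldsymbol{\xi} - \boldsymbol{\zeta}\|_* \|\mathbf{x} - \mathbf{z}_+\| \ge -\gamma (\mathbf{x} - \mathbf{z}_+)^\top (\boldsymbol{\xi} - \boldsymbol{\zeta}) \ge (\omega'(\mathbf{z}_+) - \omega'(\mathbf{x}))^\top (\mathbf{z}_+ - \mathbf{x}) \ge \alpha \|\mathbf{z}_+ - \mathbf{x}\|^2, \tag{19}$$

where in the last inequality, we use the strong convexity of $\omega(\mathbf{x})$.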



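Next, we expand the difference of the Bregman distances:

$$\begin{aligned}
D(\mathbf{u}, \mathbf{z}) - D(\mathbf{u}, \mathbf{z}_+) &= \omega(\mathbf{z}_+) - \omega(\mathbf{z}) + (\mathbf{u} - \mathbf{z}_+)^\top \omega'(\mathbf{z}_+) - (\mathbf{u} - \mathbf{z})^\top \omega'(\mathbf{z}) \\
&= \omega(\mathbf{z}_+) - \omega(\mathbf{z}) + (\mathbf{u} - \mathbf{z}_+)^\top \omega'(\mathbf{z}_+) - (\mathbf{u} - \mathbf{z}_+)^\top \omega'(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) \\
&= \omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) + (\mathbf{u} - \mathbf{z}_+)^\top (\omega'(\mathbf{z}_+) - \omega'(\mathbf{z})) \\
&= \omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) + (\mathbf{u} - \mathbf{z}_+)^\top (\gamma \boldsymbol{\zeta} + \omega'(\mathbf{z}_+) - \omega'(\mathbf{z})) - (\mathbf{u} - \mathbf{z}_+)^\top \gamma \boldsymbol{\zeta} \\
&\ge \omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) - (\mathbf{u} - \mathbf{z}_+)^\top \gamma \boldsymbol{\zeta} \\
&= \underbrace{\omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) - (\mathbf{x} - \mathbf{z}_+)^\top \gamma \boldsymbol{\zeta}}_{\epsilon} + (\mathbf{x} - \mathbf{u})^\top \gamma \boldsymbol{\zeta},
\end{aligned}$$

where the inequality follows from (18). We proceed by bounding $\epsilon$ as:

$$\begin{aligned}
\epsilon &= \omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) - (\mathbf{x} - \mathbf{z}_+)^\top \gamma \boldsymbol{\zeta} \\
&= \omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) - (\mathbf{x} - \mathbf{z}_+)^\top \gamma (\boldsymbol{\zeta} - \boldsymbol{\xi}) - (\mathbf{x} - \mathbf{z}_+)^\top \gamma \boldsymbol{\xi} \\
&= \omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) - (\mathbf{x} - \mathbf{z}_+)^\top \gamma (\boldsymbol{\zeta} - \boldsymbol{\xi}) \\
&\qquad + (\mathbf{z}_+ - \mathbf{x})^\top (\gamma \boldsymbol{\xi} - \omega'(\mathbf{z}) + \omega'(\mathbf{x})) - (\mathbf{z}_+ - \mathbf{x})^\top (\omega'(\mathbf{x}) - \omega'(\mathbf{z})) \\
&\ge \omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{z}_+ - \mathbf{z})^\top \omega'(\mathbf{z}) - (\mathbf{x} - \mathbf{z}_+)^\top \gamma (\boldsymbol{\zeta} - \boldsymbol{\xi}) - (\mathbf{z}_+ - \mathbf{x})^\top (\omega'(\mathbf{x}) - \omega'(\mathbf{z})) \\
&= \omega(\mathbf{z}_+) - \omega(\mathbf{z}) - (\mathbf{x} - \mathbf{z})^\top \omega'(\mathbf{z}) - (\mathbf{x} - \mathbf{z}_+)^\top \gamma (\boldsymbol{\zeta} - \boldsymbol{\xi}) - (\mathbf{z}_+ - \mathbf{x})^\top \omega'(\mathbf{x}) \\
&= \left[ \omega(\mathbf{z}_+) - \omega(\mathbf{x}) - (\mathbf{z}_+ - \mathbf{x})^\top \omega'(\mathbf{x}) \right] + \left[ \omega(\mathbf{x}) - \omega(\mathbf{z}) - (\mathbf{x} - \mathbf{z})^\top \omega'(\mathbf{z}) \right] - (\mathbf{x} - \mathbf{z}_+)^\top \gamma (\boldsymbol{\zeta} - \boldsymbol{\xi}) \\
&\ge \frac{\alpha}{2}\|\mathbf{x} - \mathbf{z}_+\|^2 + \frac{\alpha}{2}\|\mathbf{x} - \mathbf{z}\|^2 - \gamma \|\mathbf{x} - \mathbf{z}_+\| \|\boldsymbol{\zeta} - \boldsymbol{\xi}\|_* \\
&\ge \frac{\alpha}{2}\left\{ \|\mathbf{x} - \mathbf{z}\|^2 + \|\mathbf{x} - \mathbf{z}_+\|^2 \right\} - \frac{\gamma^2}{\alpha}\|\boldsymbol{\zeta} - \boldsymbol{\xi}\|_*^2,
\end{aligned}$$

where the first inequality follows from (17), the second inequality follows from the strong
convexity of $\omega(\mathbf{x})$, and the last inequality follows from (19). Combining the above
results, we have

$$\gamma (\mathbf{x} - \mathbf{u})^\top \boldsymbol{\zeta} \le D(\mathbf{u}, \mathbf{z}) - D(\mathbf{u}, \mathbf{z}_+) + \frac{\gamma^2}{\alpha}\|\boldsymbol{\xi} - \boldsymbol{\zeta}\|_*^2 - \frac{\alpha}{2}\left\{ \|\mathbf{x} - \mathbf{z}_+\|^2 + \|\mathbf{x} - \mathbf{z}\|^2 \right\}.$$
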



Appendix B: A Randomized Algorithm for Online Bandit Convex 
Optimization 

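In this appendix, we present a randomized algorithm for online bandit convex optimization with
a variation-based regret bound. Besides the smoothness assumption on the cost functions and the
boundedness assumption on the domain $\mathcal{P} \subseteq \mathcal{B}$, we further assume that (i) there
exists $r \le 1$ such that $r\mathcal{B} \subseteq \mathcal{P}$, and (ii) the cost functions themselves
are Lipschitz continuous, i.e., there exists a constant $G$ such that
$|c_t(\mathbf{x}) - c_t(\mathbf{z})| \le G\|\mathbf{x} - \mathbf{z}\|_2, \forall \mathbf{x}, \mathbf{z} \in \mathcal{P}, \forall t$. To present
the algorithm, we introduce some notation. Let $i_t$ denote a random index in
$\{1, \ldots, d\}$, let $\mathbf{e}_i$ denote the $i$th standard basis vector, and

$$g_{t-1}(\mathbf{z}_{t-1}) = \frac{1}{\delta}\sum_{i=1}^{d} \left( c_{t-1}(\mathbf{z}_{t-1} + \delta \mathbf{e}_i) - c_{t-1}(\mathbf{z}_{t-1}) \right) \mathbf{e}_i,$$

$$g_t(\mathbf{x}, \mathbf{e}_i) = \frac{d}{\delta}\left( c_t(\mathbf{x} + \delta \mathbf{e}_i) - c_t(\mathbf{x}) \right) \mathbf{e}_i,$$

$$\hat{g}_t(\mathbf{x}_t, \mathbf{e}_{i_t}) = g_t(\mathbf{x}_t, \mathbf{e}_{i_t}) + g_{t-1}(\mathbf{z}_{t-1}) - g_{t-1}(\mathbf{z}_{t-1}, \mathbf{e}_{i_t}).$$

The detailed steps are shown in Algorithm 5.

Algorithm 5 Randomized Online Bandit Convex Optimization
1: Input: $\eta, \alpha, \delta > 0$
2: Initialization: $\mathbf{z}_0 = \mathbf{0}$ and $c_0(\mathbf{x}) = 0$
3: for $t = 1, \ldots, T$ do
4:   Compute $\mathbf{x}_t$ by $\mathbf{x}_t = \arg\min_{\mathbf{x} \in (1-\alpha)\mathcal{P}} \left\{ \mathbf{x}^\top g_{t-1}(\mathbf{z}_{t-1}) + \frac{G}{2\eta}\|\mathbf{x} - \mathbf{z}_{t-1}\|_2^2 \right\}$
5:   Randomly sample $i_t \in \{1, \ldots, d\}$
6:   Observe $c_t(\mathbf{x}_t), c_t(\mathbf{x}_t + \delta \mathbf{e}_{i_t})$
7:   Update $\mathbf{z}_t$ by $\mathbf{z}_t = \arg\min_{\mathbf{x} \in (1-\alpha)\mathcal{P}} \left\{ \mathbf{x}^\top \hat{g}_t(\mathbf{x}_t) + \frac{G}{2\eta}\|\mathbf{x} - \mathbf{z}_{t-1}\|_2^2 \right\}$
8:   Observe $c_t(\mathbf{z}_t), c_t(\mathbf{z}_t + \delta \mathbf{e}_i), i = 1, \ldots, d$
9: end for

We use the notation $\hat{g}_t(\mathbf{x}_t) = \hat{g}_t(\mathbf{x}_t, \mathbf{e}_{i_t})$ for short. It can be shown
that $\mathrm{E}_t[\hat{g}_t(\mathbf{x}_t)] = \mathrm{E}_t[g_t(\mathbf{x}_t, \mathbf{e}_{i_t})]$. The reason to use
$\hat{g}_t(\mathbf{x}_t)$ rather than $g_t(\mathbf{x}_t, \mathbf{e}_{i_t})$ in updating $\mathbf{z}_t$ is to cancel
$g_{t-1}(\mathbf{z}_{t-1})$ in updating $\mathbf{x}_t$. To prove the regret bound, we define another
variation of cost functions by

$$\mathrm{EVAR}_T^c = \sum_{t=0}^{T-1} \max_{\mathbf{x} \in \mathcal{P}} |c_{t+1}(\mathbf{x}) - c_t(\mathbf{x})|^2. \tag{20}$$

Unlike the variation defined in (7), which uses the gradients of the cost functions, the
variation in (20) is defined according to the values of cost functions. We bound the regret of
Algorithm 5 by the value-based variation in (20) rather than by the gradient-based variation in
(7) because in the bandit setting we only have point evaluations of the cost functions. The
following theorem states the regret bound for Algorithm 5.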



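Theorem 6 Let $c_t(\cdot), t = 1, \ldots, T$ be a sequence of $G$-Lipschitz continuous convex
functions with $L$-Lipschitz continuous gradients. By setting

$$\delta = \sqrt{\frac{4d\max\left\{G, \sqrt{\mathrm{EVAR}_T^c}\right\}}{(dL + G(1 + 1/r))T}}, \quad \eta = \frac{\delta}{4d}\min\left\{1, \frac{G}{\sqrt{\mathrm{EVAR}_T^c}}\right\}, \quad \text{and} \quad \alpha = \frac{\delta}{r},$$

we have the regret bound for Algorithm 5 by

$$\mathrm{E}\left[ \sum_{t=1}^{T} \frac{1}{2}\left( c_t(\mathbf{x}_t) + c_t(\mathbf{x}_t + \delta \mathbf{e}_{i_t}) \right) \right] - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le 4\sqrt{\max\left(G, \sqrt{\mathrm{EVAR}_T^c}\right) d \left( dL + G(1 + 1/r) \right) T}.$$
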



Remark: Similar to the regret bound in (Agarwal et al. (2010), Theorem 9), Algorithm 5 also
gives the optimal regret bound $O(\sqrt{T})$ when the variation is independent of the number of
trials. Our regret bound has a better dependence on the dimension $d$ (i.e., $d$) compared with
the regret bound in (Agarwal et al., 2010) (i.e., $d^2$).
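
The gradient estimators used by Algorithm 5 can be sketched from point evaluations alone. In the sketch below, the $d/\delta$ scaling of the single-coordinate estimate is an assumption, chosen so that its average over a uniformly random index matches the full finite-difference estimate; the function names are hypothetical as well.

```python
def full_gradient_estimate(c, z, delta):
    """(1/delta) * sum_i (c(z + delta e_i) - c(z)) e_i, using d + 1 evaluations."""
    base = c(z)
    g = []
    for i in range(len(z)):
        shifted = list(z)
        shifted[i] += delta
        g.append((c(shifted) - base) / delta)
    return g


def single_coordinate_estimate(c, x, i, delta):
    """(d/delta) * (c(x + delta e_i) - c(x)) e_i, using two evaluations."""
    d = len(x)
    shifted = list(x)
    shifted[i] += delta
    g = [0.0] * d
    g[i] = d * (c(shifted) - c(x)) / delta
    return g


def corrected_estimate(c_t, c_prev, x, z_prev, i, delta):
    """Single-coordinate estimate of c_t at x, plus the full estimate of c_prev
    at z_prev, minus the single-coordinate estimate of c_prev at z_prev."""
    a = single_coordinate_estimate(c_t, x, i, delta)
    b = full_gradient_estimate(c_prev, z_prev, delta)
    e = single_coordinate_estimate(c_prev, z_prev, i, delta)
    return [ai + bi - ei for ai, bi, ei in zip(a, b, e)]


def average_over_indices(estimator, d):
    """Average of estimator(i) over i = 0, ..., d - 1 (expectation under a uniform index)."""
    vectors = [estimator(i) for i in range(d)]
    return [sum(v[k] for v in vectors) / d for k in range(d)]


if __name__ == "__main__":
    c_t = lambda v: sum(a * a for a in v)
    c_prev = lambda v: sum(v)
    x, z_prev, delta = [0.1, -0.2, 0.3], [0.0, 0.0, 0.0], 0.01
    print(average_over_indices(lambda i: corrected_estimate(c_t, c_prev, x, z_prev, i, delta), 3))
    print(full_gradient_estimate(c_t, x, delta))
```

Averaging over the index shows that the correction terms cancel in expectation, which is the property stated as $\mathrm{E}_t[\hat{g}_t(\mathbf{x}_t)] = \mathrm{E}_t[g_t(\mathbf{x}_t, \mathbf{e}_{i_t})]$.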

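Proof Let $h_t(\mathbf{x}) = c_t(\mathbf{x}) + (\hat{g}_t(\mathbf{x}_t) - \nabla c_t(\mathbf{x}_t))^\top \mathbf{x}$. It is easily
seen that $\nabla h_t(\mathbf{x}_t) = \hat{g}_t(\mathbf{x}_t)$. By Lemma 4, we have for any
$\mathbf{z} \in (1 - \alpha)\mathcal{P}$

$$\begin{aligned}
\frac{\eta}{G}\nabla h_t(\mathbf{x}_t)^\top (\mathbf{x}_t - \mathbf{z}) &\le \frac{1}{2}\left( \|\mathbf{z} - \mathbf{z}_{t-1}\|_2^2 - \|\mathbf{z} - \mathbf{z}_t\|_2^2 \right) + \frac{\eta^2}{G^2}\|\hat{g}_t(\mathbf{x}_t) - g_{t-1}(\mathbf{z}_{t-1})\|_2^2 - \frac{1}{2}\|\mathbf{x}_t - \mathbf{z}_{t-1}\|_2^2 \\
&= \frac{1}{2}\left( \|\mathbf{z} - \mathbf{z}_{t-1}\|_2^2 - \|\mathbf{z} - \mathbf{z}_t\|_2^2 \right) - \frac{1}{2}\|\mathbf{x}_t - \mathbf{z}_{t-1}\|_2^2 \\
&\qquad + \frac{\eta^2}{G^2}\|g_t(\mathbf{x}_t, \mathbf{e}_{i_t}) - g_t(\mathbf{z}_{t-1}, \mathbf{e}_{i_t}) + g_t(\mathbf{z}_{t-1}, \mathbf{e}_{i_t}) - g_{t-1}(\mathbf{z}_{t-1}, \mathbf{e}_{i_t})\|_2^2.
\end{aligned}$$

By expanding the last term using the definitions of $g_t$ and the Lipschitz continuity of
$c_t(\cdot)$, we have

$$\begin{aligned}
\frac{\eta}{G}\nabla h_t(\mathbf{x}_t)^\top (\mathbf{x}_t - \mathbf{z}) &\le \frac{1}{2}\left( \|\mathbf{z} - \mathbf{z}_{t-1}\|_2^2 - \|\mathbf{z} - \mathbf{z}_t\|_2^2 \right) - \frac{1}{2}\|\mathbf{x}_t - \mathbf{z}_{t-1}\|_2^2 + \frac{8\eta^2 d^2}{\delta^2}\|\mathbf{x}_t - \mathbf{z}_{t-1}\|_2^2 \\
&\qquad + \frac{8\eta^2 d^2}{G^2\delta^2}\max_{\mathbf{x} \in \mathcal{P}} |c_t(\mathbf{x}) - c_{t-1}(\mathbf{x})|^2 \\
&\le \frac{1}{2}\left( \|\mathbf{z} - \mathbf{z}_{t-1}\|_2^2 - \|\mathbf{z} - \mathbf{z}_t\|_2^2 \right) + \left( \frac{8\eta^2 d^2}{\delta^2} - \frac{1}{2} \right)\|\mathbf{x}_t - \mathbf{z}_{t-1}\|_2^2 \\
&\qquad + \frac{8\eta^2 d^2}{G^2\delta^2}\max_{\mathbf{x} \in \mathcal{P}} |c_t(\mathbf{x}) - c_{t-1}(\mathbf{x})|^2 \\
&\le \frac{1}{2}\left( \|\mathbf{z} - \mathbf{z}_{t-1}\|_2^2 - \|\mathbf{z} - \mathbf{z}_t\|_2^2 \right) + \frac{8\eta^2 d^2}{G^2\delta^2}\max_{\mathbf{x} \in \mathcal{P}} |c_t(\mathbf{x}) - c_{t-1}(\mathbf{x})|^2,
\end{aligned}$$

where the last inequality follows from the fact $\eta \le \delta/(4d)$. Taking the summation over
$t = 1, \ldots, T$, and by the convexity of $h_t(\mathbf{x})$, we have

$$\sum_{t=1}^{T} h_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} h_t((1 - \alpha)\mathbf{x}) \le \frac{G}{2\eta} + \frac{8\eta d^2}{G\delta^2}\mathrm{EVAR}_T^c \le \frac{4d}{\delta}\max\left(G, \sqrt{\mathrm{EVAR}_T^c}\right).$$

Following the proof of Theorem 8 in (Agarwal et al., 2010), we have

$$\begin{aligned}
\mathrm{E}\left[ \sum_{t=1}^{T} c_t(\mathbf{x}_t) - \sum_{t=1}^{T} c_t(\mathbf{x}) \right]
&\le \mathrm{E}\left[ \sum_{t=1}^{T} h_t(\mathbf{x}_t) - \sum_{t=1}^{T} h_t(\mathbf{x}) \right] + \mathrm{E}\left[ \sum_{t=1}^{T} \left( c_t(\mathbf{x}_t) - h_t(\mathbf{x}_t) - c_t(\mathbf{x}) + h_t(\mathbf{x}) \right) \right] \\
&\le \mathrm{E}\left[ \sum_{t=1}^{T} h_t(\mathbf{x}_t) - \sum_{t=1}^{T} h_t(\mathbf{x}) \right] + \mathrm{E}\left[ \sum_{t=1}^{T} \left( \mathrm{E}_t[\hat{g}_t(\mathbf{x}_t)] - \nabla c_t(\mathbf{x}_t) \right)^\top (\mathbf{x} - \mathbf{x}_t) \right] \\
&\le \mathrm{E}\left[ \sum_{t=1}^{T} h_t(\mathbf{x}_t) - \sum_{t=1}^{T} h_t(\mathbf{x}) \right] + dL\delta T,
\end{aligned}$$

where the last inequality follows from $\|\mathbf{x} - \mathbf{x}_t\| \le 2$,
$\mathrm{E}_t[\hat{g}_t(\mathbf{x}_t)] = \mathrm{E}_t[g_t(\mathbf{x}_t, \mathbf{e}_{i_t})]$ and the following
inequality (Agarwal et al., 2010):

$$\left\| \mathrm{E}_t[g_t(\mathbf{x}_t, \mathbf{e}_{i_t})] - \nabla c_t(\mathbf{x}_t) \right\|_2 \le \frac{dL\delta}{2}.$$

Then we have

$$\mathrm{E}\left[ \sum_{t=1}^{T} \frac{1}{2}\left( c_t(\mathbf{x}_t) + c_t(\mathbf{x}_t + \delta \mathbf{e}_{i_t}) \right) \right] - \min_{\mathbf{x} \in \mathcal{P}} \sum_{t=1}^{T} c_t(\mathbf{x}) \le \frac{4d}{\delta}\max\left(G, \sqrt{\mathrm{EVAR}_T^c}\right) + \delta dLT + \delta GT + \alpha GT.$$

Plugging in the stated values of $\delta$ and $\alpha$ completes the proof. ■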
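
The final substitution can be checked as follows. The shorthands $M$ and $B$ are introduced here only for this calculation and do not appear in the original.

```latex
% Shorthands (assumed for this check only):
%   M = \max(G, \sqrt{\mathrm{EVAR}_T^c}),   B = (dL + G(1 + 1/r))\,T.
% With \alpha = \delta / r the right-hand side reads
\frac{4dM}{\delta} + \delta dLT + \delta GT + \frac{\delta}{r} GT
  = \frac{4dM}{\delta} + \delta B .
% Choosing \delta = \sqrt{4dM / B} balances the two terms:
\frac{4dM}{\delta} + \delta B = 2\sqrt{4dMB} = 4\sqrt{dMB}
  = 4\sqrt{\max\bigl(G, \sqrt{\mathrm{EVAR}_T^c}\bigr)\, d\, \bigl(dL + G(1 + 1/r)\bigr)\, T} .
```

This matches the value of $\delta$ and the bound stated in Theorem 6.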